02 - Text Analysis

University of San Francisco, MSMI-608

Outline

Pre-Class Code Assignment Instructions

This semester, I am going to ask you to do a fair bit of work before coming to class. This will make our class time shorter, more manageable, and hopefully less boring.

I am also going to use this as an opportunity for you to directly earn grade points for your effort/labor, rather than “getting things right” on an exam.

Therefore, I will ask you to work through the posted slides on Canvas before class. Throughout the slides, I will post Pre-Class Questions for you to work through in R. These will look like this:

Pre-Class Q1

In R, please write code that will read in the .csv from Canvas called sf_listings_2312.csv. Assign this the name bnb.

You will then write your answer in a .r script:

Click to show code and output
# Q1
bnb <- read.csv("sf_listings_2312.csv")

Important:

To earn full points, you need to organize your code correctly. Specifically, you need to:

  • Answer questions in order.
    • If you answer them out of order, just re-arrange the code after.
  • Preface each answer with a comment (# Q1/# Q2/# Q3) that indicates exactly which question you are answering.
    • Please just write the letter Q and the number in this comment.
  • Make sure your code runs on its own, on anyone’s computer.
    • To do this, I would always include rm(list = ls()) at the top of every .r script. This clears everything from the environment, so you can check that your script runs from scratch, and therefore will run on my computer.

Handing this in:

  • You must submit this to Canvas before 9:00am on the day of class. Even if class starts at 10:00am that day, these are always due at 9:00.
  • You must submit this code as a .txt file. This is because Canvas cannot present .R files to me in SpeedGrader. To save as .txt:
    • Click File -> New File -> Text File
    • Copy and paste your completed code to that new text file.
    • Save the file as firstname_lastname_module.txt
      • For example, my file for Module 01 would be matt_meister_01.txt
      • My file for module 05 would be matt_meister_05.txt

Grading:

  • I will grade these for completion.
  • You will receive 1 point for every question you give an honest attempt to answer.
  • Your grade will be the number of questions you answer, divided by the total number of questions.
    • This is why it is important that you number each answer with # Q1.
    • Any questions that are not numbered this way will be graded incomplete, because I can’t find them.
  • You will receive a 25% penalty for submitting these late.
  • I will post my solutions after class.

Text Analysis

Load in these packages. If you do not have them, you will need to install them.

  • e.g., install.packages("tidytext")
library(tidytext)
library(stringr)
library(dplyr)
library(ggplot2)
library(topicmodels)
library(tidyr)
library(Matrix)

Read in the Airbnb listings from last class (as bnb) as well as the reviews (on Canvas):

Click to show code and output
bnb <- read.csv('sf_listings_2312.csv')
revs <- read.csv('sf_reviews_2312.csv')

We have spent a lot of time with numbers. We have even dabbled a bit in turning text into numbers. For example, whenever we have made dummy codes/indicator variables (e.g., for gender), we are taking words and turning them into numbers.

Text is an extremely useful form of data, especially for us as market researchers. However, it is not always obvious how to take text and turn it into something that we can test–or use to test other things.

In this module, I will briefly introduce four kinds of text analysis:

  1. Bag-of-words/sentiment
  2. Topic modeling
  3. Keywords
  4. Classification

Unfortunately, we do not have the time to go into great detail on any one of these. Doing so could be an entire class. Therefore, I suggest–if you are interested–looking online at the many, many blogs/walkthroughs you can find about these.

Join Reviews and Listings

We are going to analyze the text of the reviews we have for our Airbnb listings. To do so, it would be nice to have the listing information attached to each review. We can do this with a join function, from dplyr. There are multiple kinds of joins:

1. Inner Join (inner_join)

  • Description: Combines two datasets by returning only the rows with matching keys in both datasets.
  • Use case: Use this when you need data that exists in both datasets.
  • Example: inner_join(df1, df2, by = "key_column")

2. Left Join (left_join)

  • Description: Returns all rows from the first dataset (df1) and the matching rows from the second dataset (df2). If no match is found, NA is returned for columns from df2.
  • Use case: Use this when the focus is on keeping all rows from the left dataset.
  • Example: left_join(df1, df2, by = "key_column")

3. Right Join (right_join)

  • Description: Returns all rows from the second dataset (df2) and the matching rows from the first dataset (df1). If no match is found, NA is returned for columns from df1.
  • Use case: Use this when the focus is on keeping all rows from the right dataset.
  • Example: right_join(df1, df2, by = "key_column")

4. Full Join (full_join)

  • Description: Returns all rows from both datasets. Rows with no match in either dataset will have NA in the missing columns.
  • Use case: Use this when you want a complete merge of both datasets, keeping all rows regardless of matching.
  • Example: full_join(df1, df2, by = "key_column")

5. Semi Join (semi_join)

  • Description: Returns only the rows from the first dataset (df1) that have a match in the second dataset (df2). It does not add columns from df2.
  • Use case: Use this to filter rows in the first dataset based on matching keys in the second dataset.
  • Example: semi_join(df1, df2, by = "key_column")

6. Anti Join (anti_join)

  • Description: Returns only the rows from the first dataset (df1) that do not have a match in the second dataset (df2).
  • Use case: Use this to find rows in the first dataset that have no match in the second dataset.
  • Example: anti_join(df1, df2, by = "key_column")
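To make these concrete, here is a small hypothetical example (the data frames and column names are made up for illustration) comparing a few join types:

```r
library(dplyr)

# Hypothetical data: two listings, but reviews only for listings 1 and 3
listings <- data.frame(listing_id = c(1, 2), neighborhood = c("Mission", "Castro"))
reviews  <- data.frame(listing_id = c(1, 1, 3), text = c("great", "clean", "noisy"))

inner_join(reviews, listings, by = "listing_id")  # 2 rows: only listing 1's reviews
left_join(reviews, listings, by = "listing_id")   # 3 rows: listing 3 gets NA neighborhood
anti_join(listings, reviews, by = "listing_id")   # 1 row: listing 2 was never reviewed
```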

We will use inner_join(), because I want to only use reviews for which we have listing information.

In our case, the key column is the listing id. Unfortunately, this is called id in the bnb data frame, and listing_id in the revs data frame. I think the easiest thing is to create a new variable in bnb:

Pre-Class Q1

How can we join revs and bnb?

Click to show code and output
bnb$listing_id <- bnb$id
revs <- inner_join(revs, bnb, by = 'listing_id', suffix = c('_revs', '_listing'))
  • suffix = tells R what to put at the end of each column that is in both data frames. This tells us where something came from if it is duplicated.

Cleaning Text

Why?

Text data is often messy, and has a lot of elements we can’t use, such as punctuation, numbers, extra spaces, and special characters. These can obscure underlying patterns, and/or make the data impossible to use. Cleaning the text helps:

  • Improve the quality of analysis.
  • Ensure uniformity in the text (e.g., all lowercase).
  • Prepare the text for further processing, such as tokenization or sentiment analysis.

How?

  1. Lowercasing: Ensures uniformity by converting all text to lowercase.
  2. Removing punctuation: Strips out special characters that don’t contribute to the meaning of the text.
  3. Removing numbers: Eliminates numeric characters if they are not relevant.
  4. Removing stopwords: Excludes common words (e.g., “and,” “the”) that don’t add much value.
  5. Trimming whitespace: Removes extra spaces for clean formatting.

Example

The text in revs is contained in the column comments. Below, we will complete each step mentioned above.

Pre-Class Q2

Lowercasing:

Click to show code and output
revs$clean_text <- tolower(revs$comments)

Pre-Class Q3

Remove punctuation:
Click to show code and output
revs$clean_text <- str_remove_all(revs$clean_text, "[[:punct:]]")

Pre-Class Q4

Remove numbers:
Click to show code and output
revs$clean_text <- str_remove_all(revs$clean_text, "\\d+")

Pre-Class Q5

Remove stopwords:
Click to show code and output
revs$clean_text <- str_remove_all(revs$clean_text,
                                  paste0("\\b(", paste(stop_words$word, collapse = "|"), ")\\b"))
  • When we loaded tidytext, it also loaded the data frame stop_words in the background.
  • This contains a bunch of very common words, which don’t add much.

What’s the \\b?

The \\b ensures that the pattern matches whole words only, rather than substrings inside larger words.

In our example:

str_remove_all(revs$clean_text, paste0("\\b(", paste(stop_words$word, collapse = "|"), ")\\b"))

The \\b ensures that stopwords like “a” or “the” are removed only when they are standalone words, not parts of other words.

  • The standalone word “a” will be removed from “we found a rat in the fridge”, but the “a” inside “rat” will not
    • Result: “we found rat in the fridge” (the other stopwords in the sentence, like “the” and “in”, would also be removed by the full pattern)
  • This prevents accidental removal of parts of words, ensuring cleaner and more accurate text processing.
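A quick way to see what \\b does, on a toy string rather than our data:

```r
library(stringr)

x <- "a banana and a mango"
str_remove_all(x, "a")        # removes every "a", even inside words: " bnn nd  mngo"
str_remove_all(x, "\\ba\\b")  # removes only the standalone word "a": " banana and  mango"
```

Without the word boundary, “banana” loses all of its a’s; with it, only the article “a” disappears.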

Pre-Class Q6

Remove whitespace:

Click to show code and output
revs$clean_text <- str_squish(revs$clean_text)

Purpose of Each Step

  1. Convert to Lowercase (tolower()):
    • Text normalization to treat “Hello” and “hello” as the same.
  2. Remove Punctuation (str_remove_all("[[:punct:]]")):
    • Strips out symbols like !, ?, or . that don’t carry semantic meaning.
  3. Remove Numbers (str_remove_all("\\d+")):
    • Gets rid of numeric values that may not be relevant to the analysis.
  4. Remove Stopwords (str_remove_all):
    • Uses a pre-defined list of stopwords from the tidytext package.
  5. Trim Extra Whitespace (str_squish):
    • Cleans up any remaining extra spaces for neatness.
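For reference, the five steps can also be chained into a single pipeline. This is a sketch on a toy one-row data frame (same functions as above, just combined with mutate(); the stopword pattern is anchored with \\b on both ends):

```r
library(dplyr)
library(stringr)
library(tidytext)  # provides the stop_words data frame

toy <- data.frame(comments = "We LOVED it!! Stayed 3 nights near the   park.")

toy <- toy |>
  mutate(clean_text = tolower(comments),                           # 1. lowercase
         clean_text = str_remove_all(clean_text, "[[:punct:]]"),   # 2. punctuation
         clean_text = str_remove_all(clean_text, "\\d+"),          # 3. numbers
         clean_text = str_remove_all(clean_text,                   # 4. stopwords
             paste0("\\b(", paste(stop_words$word, collapse = "|"), ")\\b")),
         clean_text = str_squish(clean_text))                      # 5. whitespace

toy$clean_text
```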

Outcome

Here’s what the updated revs data looks like:

Click to show code and output
head(revs[,c('comments', 'clean_text')])
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  comments
1                                                                                The bad: Overall felt like a college dorm. The room was very small. the common areas dingy and the carpet and walls dirty. There’s a mildew smell on the air mixed with the many spices of everyone’s dinner. With no air conditioning or fan you have to sleep with the windows open, but the street noise is bad. The front door is an iron behemoth that shakes the whole place with a boom every time it closes. The good: convenient location and good communication
2                                                                                                                                                                                                                                                                                          We really liked Lily's place! It was very clean and practical for our visit to San Francisco. The location is good, specially if you have a car. Heads-up, don’t park on the side walk. Although we parked exactly where Lily told us, we got a parking ticket.
3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                A good place to crash in a great location
4 My partner and I were in SF for three days and feel like this hostel was well worth the price! The bathroom and showers were always clean and its location right outside of Chinatown was so convenient. The front desk was very accommodating and helpful with the check-in and check-out process, even offering to hold our bags for a few hours for the latter. The surrounding area generally felt safe, with my partner and I taking walks around the area until 9-10pm. Great experience overall! Highly recommend, especially given its location.
5                                                                                                                                                                                                                                                                                                                                                                                                            Couldn’t recommend this place enough! Richard and Bina we both so lovely, helpful, and went above and beyond on communication. Very grateful!
6                                                                                                                                                                                                                                                                                                                                                                                                                                          We had a great stay in Noe Valley! Hosts were so nice and eager to help with anything. Can’t beat the location!
                                                                                                                                                                                                                                                     clean_text
1                        bad college dorm common dingy carpet walls dirty mildew smell air mixed spices everyones dinner air conditioning fan sleep windows street noise bad front door iron behemoth shakes boom time closes convenient location communication
2                                                                                                                                       lilys clean practical visit san francisco location specially car headsup dont park walk parked lily told parking ticket
3                                                                                                                                                                                                                                                crash location
4 partner sf days feel hostel worth price bathroom showers clean location chinatown convenient front desk accommodating helpful checkin checkout process offering hold bags hours surrounding safe partner taking walks pm experience highly recommend location
5                                                                                                                                                                                           couldnt recommend richard bin lovely helpful communication grateful
6                                                                                                                                                                                                                stay noe valley hosts nice eager beat location

Next, we will get hands-on experience with Bag-of-Words (BoW) analysis using tidytext, starting with sentiment analysis. BoW is simple, but very flexible.

Bag of Words/Sentiment Analysis

What is Bag-of-Words?

BoW represents text as a collection of words, ignoring grammar and word order, focusing only on the presence (or frequency) of words. Effectively, this is identifying if a word exists in some text. For example, Did this reviewer say “rat”?
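That “Did this reviewer say X?” question is a one-line check with str_detect() (toy strings here, for illustration):

```r
library(stringr)

reviews <- c("we found a rat in the fridge", "spotless, cozy, and quiet")
str_detect(reviews, "\\brat\\b")
# TRUE for the first review, FALSE for the second
```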

Advantages:

  • Simplicity and interpretability.
  • Versatile for various analyses.

Disadvantages:

  • Ignores word order and context (e.g., sarcasm).
  • Can be improved with more sophisticated methods (e.g., word embeddings).

  • Use Cases:
    • Sentiment analysis.
    • Word frequency analysis.
    • Text classification and topic modeling.

Sentiment Analysis Example:

We will use the revs dataset to compute the average sentiment score of reviews. Sentiment analysis helps us understand customer opinions. It is especially useful when we can’t quantify those opinions in other ways. With reviews, we usually have a star rating, but we don’t have one for these Airbnb reviews. So sentiment will help fill that gap.

Sentiment analysis is also a relatively simple application of BoW. It just assigns positive/negative labels to text.

Steps

  1. Tokenization: Break text into individual words.
  2. Join Sentiments: Match words with sentiment scores.
  3. Compute Scores: Summarize sentiment for each review.

Step 1: Tokenization

We tokenize with the function unnest_tokens() in R. The function takes a data frame, an output column name, and an input column. For us, the output column is always going to be called “word”. To see how this works, let’s use a simple example. Remember, the text has been cleaned.

two_reviews <- data.frame(
  id = c(1,2),
  rating = c(2,5),
  clean_text = c("disgusting this place was full of rats", "beautiful rats")
)

two_reviews
  id rating                             clean_text
1  1      2 disgusting this place was full of rats
2  2      5                         beautiful rats
two_reviews_tokens <- unnest_tokens(two_reviews, word, clean_text)
two_reviews_tokens
  id rating       word
1  1      2 disgusting
2  1      2       this
3  1      2      place
4  1      2        was
5  1      2       full
6  1      2         of
7  1      2       rats
8  2      5  beautiful
9  2      5       rats

Step 2: Join Sentiments

To join sentiments, we have to get a data frame of words and sentiments from somewhere. Luckily, similar to stop_words, there are also sentiment data frames that come with tidytext. We will use the bing lexicon, for no real reason. Feel free to try others.

bing_sentiments <- get_sentiments("bing")
head(bing_sentiments)
# A tibble: 6 × 2
  word       sentiment
  <chr>      <chr>    
1 2-faces    negative 
2 abnormal   negative 
3 abolish    negative 
4 abominable negative 
5 abominably negative 
6 abominate  negative 

bing_sentiments is a data frame of 6786 words, each with either a positive or negative tag. This is essentially our “bag” of words.
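To see the composition of this bag, we can count the tags (the bing lexicon ships with tidytext, so no extra download is needed):

```r
library(tidytext)
library(dplyr)

bing_sentiments <- get_sentiments("bing")
count(bing_sentiments, sentiment)
# two rows: how many negative words and how many positive words are in the bag
```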

Using inner_join(), we can see how many positive and negative words are in each review. We can save this as two_reviews_bing.

two_reviews_tokens |>
  inner_join(bing_sentiments, by = "word") 
  id rating       word sentiment
1  1      2 disgusting  negative
2  2      5  beautiful  positive
two_reviews_tokens |>
  inner_join(bing_sentiments, by = "word") |>
  count(id, sentiment, sort = TRUE) 
  id sentiment n
1  1  negative 1
2  2  positive 1
two_reviews_bing <- two_reviews_tokens |>
  inner_join(bing_sentiments, by = "word") |>
  count(id, sentiment, sort = TRUE) 

Step 3: Summarize each review

two_reviews_tokens |>
  inner_join(bing_sentiments, by = "word") |>
  count(id, sentiment, sort = TRUE) |>
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) |>
  mutate(sentiment_score = positive - negative)
# A tibble: 2 × 4
     id negative positive sentiment_score
  <dbl>    <int>    <int>           <int>
1     1        1        0              -1
2     2        0        1               1

And save that summary as two_reviews_sentiment.

two_reviews_sentiment <- two_reviews_tokens |>
  inner_join(bing_sentiments, by = "word") |>
  count(id, sentiment, sort = TRUE) |>
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) |>
  mutate(sentiment_score = positive - negative)

Pre-Class Q7

Now, let’s do all of these steps for the revs data frame.

Perform Tokenization for revs. Save the result as revs_tokens.

Click to show code and output
revs_tokens <- revs |>
  unnest_tokens(word, clean_text)

Pre-Class Q8

Get sentiments and join them with revs_tokens. Save this as revs_tokens_bing.

Click to show code and output
bing_sentiments <- get_sentiments("bing")

revs_tokens_bing <- revs_tokens |>
  inner_join(bing_sentiments, by = "word") |>
  count(id, sentiment, sort = TRUE) 

revs_sentiment <- revs_tokens_bing |>
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) |>
  mutate(sentiment_score = positive - negative)

Pre-Class Q9

Summarize revs_tokens_bing. Save this as revs_sentiment.

Click to show code and output
revs_sentiment <- revs_tokens_bing |>
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) |>
  mutate(sentiment_score = positive - negative)

Pre-Class Q10

Join this back to revs, and save it as revs.

Click to show code and output
revs <- revs |>
  left_join(revs_sentiment, by = "id")

Pre-Class Q11

Visualize the distribution of sentiment.
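One possible answer is a histogram with ggplot2. In your script, you would plot revs$sentiment_score directly; the sketch below builds a tiny hypothetical stand-in data frame (revs_demo, with made-up scores) so it runs on its own:

```r
library(ggplot2)

# Hypothetical stand-in for revs after Q10
revs_demo <- data.frame(sentiment_score = c(-2, 0, 1, 1, 3, 4, 5, 5, 6, 8))

p <- ggplot(revs_demo, aes(x = sentiment_score)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Distribution of Review Sentiment",
       x = "Sentiment score (positive - negative words)",
       y = "Number of reviews")
p
```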

More Bags, More Words

This is a super flexible tool. You can look for anything. But there are some more common applications. For example, Word Frequency Analysis identifies the most common words in reviews.

Pre-Class Q12

Find the 10 most common words in revs.

Click to show code and output
word_frequencies <- revs_tokens |>
  count(word, sort = TRUE)

head(word_frequencies, 10)
          word    n
1         stay 2445
2           br 1382
3     location 1255
4        clean 1213
5         host  931
6         nice  801
7  comfortable  783
8    recommend  736
9         easy  716
10        home  657

Topic Modeling with LDA

Latent Dirichlet Allocation (LDA) is a powerful machine learning method for discovering hidden topics in text data. It groups words that appear together frequently, which you can then label, and use to classify text later. LDA also requires a decent amount of computing power, so I am not going to ask coding questions in this section. I will ask you to discuss my results.

What is Topic Modeling?

Topic modeling is an “unsupervised” machine learning technique that identifies hidden themes or topics in a collection of documents. Unsupervised just means that there is no set outcome. There is no dependent variable. Topic modeling will just show what words appear together often.

You can use this to identify key themes, or to classify text into topics. For example: Airbnb reviews that talk about rats vs Airbnb reviews that talk about crime. The key difference between using topic modeling and bag-of-words for classification is that topic modeling should pick up context better than bag-of-words.

Advantages:

  • Unsupervised: No need for labeled data.
  • Interpretable: Provides human-readable results.

Disadvantages:

  • Requires careful preprocessing.
  • May struggle with small datasets or very sparse data.

Implementation

To extract topics, we fit an LDA model to the “document-term matrix”. Specifically, we tell LDA how many topics we want with k =. I am going to ask for four.

The document-term matrix is similar to our data frame of tokens, but is summarized (somewhat) at the document level. For us, the document is each review, and the term is each word. This document-term matrix tells us, for every word that appears in any review, how many times it appears in each specific review.
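To see the idea, here is a toy version of a document-term matrix built with pivot_wider() (the word counts are made up; cast_dtm() stores the same information in the sparse format that LDA expects):

```r
library(tidyr)

# Hypothetical word counts: two reviews (documents), three words (terms)
counts <- data.frame(id   = c(1, 1, 2),
                     word = c("rat", "fridge", "beautiful"),
                     n    = c(2, 1, 1))

dtm_toy <- pivot_wider(counts, names_from = word, values_from = n, values_fill = 0)
dtm_toy
# one row per review, one column per word, cells = how often the word appears
```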

Click to show code and output
# Convert to document-term matrix
revs_tokens <- revs |>
  unnest_tokens(word, clean_text) |>
  count(id, word, sort = TRUE) |> # Count word frequencies by review
  ungroup()
dtm <- cast_dtm(revs_tokens, document = id, term = word, value = n)

lda_model <- LDA(dtm, k = 4, control = list(seed = 123))

Once this has run, we can identify the most important words for each topic. I am going to show 10.

Click to show code and output
# Extract top words for each topic
topics <- tidy(lda_model, matrix = "beta")

top_terms <- topics |>
  group_by(topic) |>
  slice_max(beta, n = 10) |>
  arrange(-beta) |>
  ungroup() |>
  mutate(term = factor(term))

# View top words
ggplot(top_terms, aes(x = term, y = beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free_y") +
  coord_flip() +
  labs(
    title = "Top 10 Terms for Each Topic",
    x = "Terms",
    y = "Beta Value (Probability)"
  ) +
  theme_minimal()

From these results, we would have to interpret the topics ourselves.

Pre-Class Q13

How would you label each of these four topics? Is there an obvious distinction between them?

Pre-Class Q14

We have missed something important in our pre-processing here. What do you notice about the words in each topic that we have missed? Alternatively, do you think we should fix this?

Keyword Analysis with TF-IDF

TF-IDF (Term Frequency-Inverse Document Frequency) is a technique for identifying keywords that are unique to individual documents. This can be useful to distinguish keywords that tell us something meaningful from words that are merely used often.

What is TF-IDF?

TF-IDF measures the importance of a word in a document relative to all of our documents (the full set of documents is called the corpus).

  • Formula: TF-IDF(t, d) = TF(t, d) × IDF(t)
  • Term Frequency (TF): How often the word t appears in document d.
  • Inverse Document Frequency (IDF): The logarithm of the ratio of total documents to the number of documents containing t.
  • TF-IDF highlights words that are frequent in one document but rare across others, identifying keywords unique to each document.
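A quick worked example with made-up numbers:

```r
# Suppose "rat" is 3 of the 30 words in one review,
# and appears in 10 of the 1,000 reviews in the corpus
tf  <- 3 / 30          # term frequency in this document: 0.1
idf <- log(1000 / 10)  # inverse document frequency (natural log)
tf * idf               # TF-IDF of about 0.46: frequent here, rare elsewhere
```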

Advantages:

  • Balances local and global importance of words.
  • Useful for keyword extraction, document summarization, and search.

Disadvantages:

  • Does not consider word context or semantics.
  • Sensitive to noisy or sparse data.

Implementation

Keyword analysis at the level of each review can be helpful, but it is potentially more useful to do this at the neighborhood level, with neighbourhood_cleansed.

To prepare the data for TF-IDF analysis, we have to tokenize and count word frequencies. We will do that within neighbourhood_cleansed.

Click to show code and output
revs_tokens <- revs |>
  unnest_tokens(word, clean_text) |>
  count(neighbourhood_cleansed, word, sort = TRUE)

# Total word counts per document (for term frequency calculation)
revs_tokens <- revs_tokens |>
  group_by(neighbourhood_cleansed) |>
  mutate(total_words = sum(n)) |>
  ungroup()

Then, we can use functions from the tidytext package to calculate the TF-IDF scores for each word in each neighborhood. In this code, I am also going to only show five keywords per neighborhood.

Click to show code and output
# Compute TF-IDF
tf_idf <- revs_tokens |>
  bind_tf_idf(term = word, document = neighbourhood_cleansed, n = n)

tf_idf <- tf_idf |>
  arrange(desc(tf_idf)) |>
  select(neighbourhood_cleansed, word, tf_idf) |>
  group_by(neighbourhood_cleansed) |>
  slice_max(tf_idf, n = 5) |>
  ungroup()

# Print top keywords
head(tf_idf, 20)
# A tibble: 20 × 3
   neighbourhood_cleansed word       tf_idf
   <chr>                  <chr>       <dbl>
 1 Bayview                tim       0.0102 
 2 Bayview                jian      0.00880
 3 Bayview                chris     0.00778
 4 Bayview                dongmei   0.00635
 5 Bayview                karl      0.00440
 6 Bernal Heights         bernal    0.0201 
 7 Bernal Heights         heights   0.00784
 8 Bernal Heights         tyler     0.00507
 9 Bernal Heights         ruben     0.00457
10 Bernal Heights         cortland  0.00442
11 Castro/Upper Market    castro    0.0210 
12 Castro/Upper Market    dolores   0.00725
13 Castro/Upper Market    todd      0.00554
14 Castro/Upper Market    ashish    0.00487
15 Castro/Upper Market    mission   0.00430
16 Chinatown              staff     0.0137 
17 Chinatown              motel     0.0127 
18 Chinatown              mary      0.0101 
19 Chinatown              gainengs  0.00920
20 Chinatown              chinatown 0.00658

What makes each neighborhood unique?

Keyword extraction is most useful in distinguishing things. For example, we might want to know what keywords in each neighborhood are different from others. Here is one way we can do that:

Pre-Class Q15

Step 1: Tokenize and count word frequencies by neighborhood

Click to show code and output
revs_tokens <- revs |> 
  unnest_tokens(word, clean_text) |> 
  count(neighbourhood_cleansed, word, sort = TRUE) |> 
  group_by(neighbourhood_cleansed) |> 
  mutate(total_words = sum(n)) |> 
  ungroup()

Pre-Class Q16

Step 2: Compute TF-IDF to identify terms

Click to show code and output
tf_idf <- revs_tokens |> 
  bind_tf_idf(term = word, document = neighbourhood_cleansed, n = n)

Pre-Class Q17

Step 3: Calculate “proportional frequencies” for comparison across neighborhoods. This measures what share of a neighborhood’s review words each word accounts for, and compares that share to the word’s average share across all neighborhoods.

Click to show code and output
tf_idf <- tf_idf |> 
  mutate(proportion = n / total_words) |> 
  group_by(word) |> 
  mutate(avg_proportion = mean(proportion), 
         z_score = (proportion - avg_proportion) / sd(proportion)) |> 
  ungroup()

Pre-Class Q18

Step 4: Identify the top distinctive keyword for each neighborhood.

Click to show code and output
distinctive_keywords <- tf_idf |> 
  group_by(neighbourhood_cleansed) |> 
  slice_max(z_score, n = 1) |>  # Select most distinctive term
  arrange(neighbourhood_cleansed, desc(z_score)) |> 
  ungroup()

Pre-Class Q19

Step 5: Interpret the results. What do you notice about these neighborhoods, based on their key words?

Click to show code and output
print(distinctive_keywords)
# A tibble: 36 × 10
   neighbourhood_cleansed word            n total_words      tf   idf  tf_idf
   <chr>                  <chr>       <int>       <int>   <dbl> <dbl>   <dbl>
 1 Bayview                pretty          8        1694 0.00472 0.251 0.00119
 2 Bernal Heights         stores         10        3922 0.00255 0.492 0.00126
 3 Castro/Upper Market    heart          17        5155 0.00330 0.750 0.00247
 4 Chinatown              free            5         779 0.00642 0.405 0.00260
 5 Crocker Amazon         entire          2         899 0.00222 0.639 0.00142
 6 Diamond Heights        spacious        3          84 0.0357  0.150 0.00534
 7 Downtown/Civic Center  muy            19        3590 0.00529 0.539 0.00285
 8 Excelsior              smooth          3        2508 0.00120 0.944 0.00113
 9 Financial District     price           8        1008 0.00794 0.405 0.00322
10 Glen Park              appreciated     5         661 0.00756 0.216 0.00164
# ℹ 26 more rows
# ℹ 3 more variables: proportion <dbl>, avg_proportion <dbl>, z_score <dbl>

Pre-Class Q20

This is going to be hard to do. But, for two points, I want you to try to make a map that has each keyword plotted in its neighborhood.